Individual word language models and the frequency approach
Abstract
We present a new method for introducing domain knowledge into an n-gram language model, based on combining language models for individual word domains. The model for each significant word is trained on its own corpus, formed by extracting from the full training corpus those subsets that contain that word. At test time, significant words are taken from a cache, and their models are combined with a global language model. Several methods of combining the models are described; a new, simple method that combines frequencies rather than probabilities gives promising results and offers a relatively simple way to add domain information to an n-gram language model. It reduces language model perplexity by 32% relative to the standard 3-gram approach, similar to the results obtained with other, more complex domain models.
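The abstract does not give the exact combination formula, so the following is only a minimal sketch of one plausible reading of the "frequency approach": the raw trigram counts of the cached word models are added to the global counts and normalised once, in contrast to the conventional route of interpolating separately normalised probabilities. Several details are assumptions not stated in the source: the corpus "subsets" are taken to be whole documents, add-one smoothing is used, and the names `prob_frequency`, `prob_interpolated`, `weight`, `lam` and the cache size of 50 are all hypothetical.

```python
from collections import Counter, deque
from typing import Deque, Dict, List, Sequence, Set, Tuple

Doc = List[str]
History = Tuple[str, str]


def trigram_counts(docs: Sequence[Doc]) -> Counter:
    """Count trigrams (w1, w2, w3) over tokenised documents."""
    counts: Counter = Counter()
    for doc in docs:
        for i in range(len(doc) - 2):
            counts[tuple(doc[i:i + 3])] += 1
    return counts


def build_word_models(docs: Sequence[Doc], significant: Set[str]) -> Dict[str, Counter]:
    """One count table per significant word, trained only on the documents
    containing that word (ASSUMPTION: corpus subsets are whole documents)."""
    return {w: trigram_counts([d for d in docs if w in d]) for w in significant}


def next_word_counts(counts: Counter, history: History, vocab: Sequence[str]) -> Dict[str, int]:
    """Counts of every vocabulary word following the two-word history."""
    return {v: counts[(history[0], history[1], v)] for v in vocab}


def prob_frequency(history: History, word: str, global_counts: Counter,
                   word_models: Dict[str, Counter], cache: Deque[str],
                   vocab: Sequence[str], weight: float = 1.0) -> float:
    """Hypothetical frequency combination: add the cached word models' raw
    counts to the global counts, then normalise once (add-one smoothing)."""
    total = Counter(next_word_counts(global_counts, history, vocab))
    for w in set(cache):
        if w in word_models:
            for v, c in next_word_counts(word_models[w], history, vocab).items():
                total[v] += weight * c
    return (total[word] + 1) / (sum(total.values()) + len(vocab))


def prob_interpolated(history: History, word: str, global_counts: Counter,
                      word_models: Dict[str, Counter], cache: Deque[str],
                      vocab: Sequence[str], lam: float = 0.5) -> float:
    """Conventional alternative for comparison: average the smoothed
    probabilities of the cached word models, then interpolate with the
    global model using weight lam."""
    def p(counts: Counter) -> float:
        row = next_word_counts(counts, history, vocab)
        return (row[word] + 1) / (sum(row.values()) + len(vocab))

    active = [word_models[w] for w in set(cache) if w in word_models]
    if not active:
        return p(global_counts)
    return lam * sum(p(m) for m in active) / len(active) + (1 - lam) * p(global_counts)


# Toy usage: "interest" is a significant word seen recently in the test text.
docs = [
    "the bank raised the interest rate".split(),
    "the bank raised the fee".split(),
    "the river bank was muddy".split(),
]
vocab = sorted({w for d in docs for w in d})
global_counts = trigram_counts(docs)
word_models = build_word_models(docs, {"interest", "river"})
cache: Deque[str] = deque(maxlen=50)  # hypothetical cache of recent significant words
cache.append("interest")
h = ("raised", "the")
print(prob_frequency(h, "interest", global_counts, word_models, cache, vocab))    # 3/12 = 0.25
print(prob_frequency(h, "interest", global_counts, word_models, deque(), vocab))  # 2/11, about 0.182
```

Summing counts before normalising lets a word model trained on a small subset contribute in proportion to how much evidence it actually holds for the current history, whereas interpolation gives every active model the same fixed share through `lam`, however sparse its counts are.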
Similar resources
Material Development and English for Academic Purposes Word Lists; a Reductionist Approach
Nagy (1988) states that vocabulary is a prerequisite for comprehension. Drawing on a reductionist approach, and with the prospects for material development in mind, this study aimed to create an English for Academic Purposes Word List (EAPWL). The corpus of this study was compiled from a corpus containing 6479 pages of texts, 2,081,678 tokens (running words) and 63825 types (...
High- and Mid-Frequency Vocabulary Size as Predictors of Iranian University EFL Students’ Speaking Performance
The literature is replete with studies on the role of vocabulary knowledge in second-language receptive skills. However, research on the relationship between aspects of vocabulary knowledge and productive skills in general, and speaking performance in particular, remains scant. This paper examined the relationship between knowledge of L2 vocabulary size at di...
The Impact of Teachers' Training on the Reliability of Tests and Assessments in Governmental and Non-governmental Sections
Assessment is considered one of the fundamental elements in the field of foreign language acquisition. For communication to take place, learners need to know an adequate number of words. The salient role of vocabulary in foreign language acquisition has resulted in the publication of several hundred papers and dozens of books. Due to the dominant role of ...
Vocabulary Lists for EAP and Conversation Students
Despite the abundance of research investigating general and academic vocabularies and developing dozens of word lists, few studies have compared academic vocabulary with general service word lists such as conversation vocabulary. Many EAP researchers assume that university students need to know all the words in West’s (1953) General Service List (GSL) as a prerequisite to academic words (e.g., ...
Using network science in the language sciences and clinic.
A number of variables—word frequency, word length—have long been known to influence language processing. This study briefly reviews the effects in speech perception and production of two more recently examined variables: phonotactic probability and neighbourhood density. It then describes a new approach to study language, network science, which is an interdisciplinary field drawing from mathema...
Publication date: 2002